Assume we have data about suicide rates in a given country, and we wish to establish a relationship between suicide rates and GDP per capita (a measure of people's living standards).

The following visualization helps explain the relationship more precisely:

A linear relationship is evident, and the fitted model is summarized below:

## 
## Call:
## lm(formula = y ~ x, data = dd)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.232915 -0.103653 -0.002237  0.105068  0.276349 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.00288    0.05555  90.056  < 2e-16 ***
## x            0.37469    0.08737   4.288 5.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1338 on 76 degrees of freedom
## Multiple R-squared:  0.1948, Adjusted R-squared:  0.1842 
## F-statistic: 18.39 on 1 and 76 DF,  p-value: 5.23e-05

Suppose we want to use the model to predict the suicide rate in a certain region with a GDP per capita of 0.63. We use the regression model as follows:

Suicide rate = 5.00288 (the intercept) + 0.37469 (the slope) × GDP per capita. Thus Suicide rate = 5.00288 + 0.37469 × 0.63 = 5.2389347

The predicted suicide rate is then 5.2389347 (the numbers are illustrative only).
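The arithmetic above can be checked with a few lines of code. This is a minimal sketch in Python (the model itself was fitted in R); the coefficients are copied from the `lm()` summary, while the function name `predict_suicide_rate` is a hypothetical helper introduced only for illustration.

```python
# Coefficients copied from the lm() summary shown earlier.
INTERCEPT = 5.00288   # estimate for (Intercept)
SLOPE = 0.37469       # estimate for x (GDP per capita)


def predict_suicide_rate(gdp_per_capita: float) -> float:
    """Evaluate the fitted regression line at a given GDP per capita.

    Hypothetical helper name, used here only for illustration.
    """
    return INTERCEPT + SLOPE * gdp_per_capita


prediction = predict_suicide_rate(0.63)
print(round(prediction, 7))  # 5.2389347
```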

Suppose we want to add noise to the final output, in order to account for the variation that the regression model leaves unexplained.

We start by calculating the distance from the predicted point to every other point in the dataset.
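The distance step might look like the sketch below, written in Python. Two things here are assumptions rather than part of the original analysis: the distance is taken to be Euclidean in the (GDP per capita, suicide rate) plane (measuring distance along GDP per capita alone would be an equally plausible reading), and the `observations` list is made-up toy data, not the dataset behind the model.

```python
import math

# Hypothetical toy observations as (GDP per capita, suicide rate) pairs.
# These are NOT the data used to fit the model above.
observations = [(0.10, 5.05), (0.55, 5.30), (0.60, 5.20), (0.70, 5.10), (0.95, 5.45)]

# The predicted point: GDP per capita 0.63 and its predicted suicide rate.
query = (0.63, 5.2389347)


def distances_to(point, data):
    """Euclidean distance from `point` to every (x, y) pair in `data`."""
    return [math.dist(point, obs) for obs in data]


dists = distances_to(query, observations)
for obs, d in zip(observations, dists):
    print(obs, round(d, 4))
```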

For example, we may want the noise to come only from the five nearest neighbors of the predicted point, so we zoom in on those five.

Once we have the five nearest neighbors, we find their residuals (error terms), that is, the difference between each observed value and the value fitted by the regression model.
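Selecting the neighbors and computing their residuals can be sketched as follows, again in Python. As before, the `observations` are hypothetical toy data and the Euclidean distance in the (GDP per capita, suicide rate) plane is an assumption; only the coefficients come from the `lm()` summary. The helper names `fitted` and `nearest_neighbors` are illustrative.

```python
import math

INTERCEPT, SLOPE = 5.00288, 0.37469  # from the lm() summary

# Hypothetical toy observations (GDP per capita, suicide rate); not the real data.
observations = [
    (0.05, 4.90), (0.50, 5.25), (0.58, 5.15), (0.62, 5.30),
    (0.66, 5.20), (0.75, 5.35), (1.20, 5.60),
]


def fitted(x):
    """Value of the regression line at x."""
    return INTERCEPT + SLOPE * x


def nearest_neighbors(x_new, data, k=5):
    """Return the k observations closest to the predicted point (x_new, fitted(x_new))."""
    point = (x_new, fitted(x_new))
    return sorted(data, key=lambda obs: math.dist(point, obs))[:k]


neighbors = nearest_neighbors(0.63, observations, k=5)

# Residual = observed value minus fitted value, for each neighbor.
residuals = [y - fitted(x) for x, y in neighbors]

for (x, y), r in zip(neighbors, residuals):
    print(f"x={x:.2f}  y={y:.2f}  residual={r:+.4f}")
```

With this toy data the two outlying observations, (0.05, 4.90) and (1.20, 5.60), are left out, so only points close to the prediction contribute residuals.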

We then average these residuals and add the average to the output of the regression model.

We average the residuals as follows:

## [1] "The residuals:  0.0884611517391702"  "The residuals:  -0.0265965873524463"
## [3] "The residuals:  -0.128730356960356"  "The residuals:  0.178158532795457"  
## [5] "The residuals:  -0.187983085773569"
## [1] "*********************************************"
## [1] "their average is:  -0.0153380691103486"

We now add their average to the value predicted by the regression model, which was 5.2389347:

## [1] "The final value is 5.22359663088965"
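The last two steps can be reproduced from the numbers printed above. The sketch below is in Python rather than R; it hard-codes the five residuals and the prediction exactly as they appear in the output, so it checks the arithmetic only and does not recompute the neighbors.

```python
from statistics import fmean

# The five residuals exactly as printed in the output above.
residuals = [
    0.0884611517391702,
    -0.0265965873524463,
    -0.128730356960356,
    0.178158532795457,
    -0.187983085773569,
]

prediction = 5.2389347  # value predicted by the regression model at GDP per capita 0.63

mean_residual = fmean(residuals)            # average of the neighbors' residuals
final_value = prediction + mean_residual    # prediction adjusted by the local noise

print("their average is:", mean_residual)
print("The final value is", final_value)
```

Because the average residual is negative here, the adjusted value ends up slightly below the plain regression prediction.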
